Goto

Collaborating Authors

 Oceania Government


Designing Speech Technologies for Australian Aboriginal English: Opportunities, Risks and Participation

arXiv.org Artificial Intelligence

In Australia, post-contact language varieties, including creoles and local varieties of international languages, emerged as a result of forced contact between Indigenous communities and English speakers. These contact varieties are widely used, yet are poorly supported by language technologies. This gap presents barriers to participation in civil and economic society for Indigenous communities using these varieties, and reproduces minoritisation of contemporary Indigenous sociolinguistic identities. This paper concerns three questions regarding this context. First, can speech technologies support speakers of Australian Aboriginal English, a local indigenised variety of English? Second, what risks are inherent in such a project? Third, what technology development practices are appropriate for this context, and how can researchers integrate meaningful community participation in order to mitigate risks? We argue that opportunities do exist -- as well as risks -- and demonstrate this through a case study exploring design practices in a real-world project aiming to improve speech technologies for Australian Aboriginal English. We discuss how we integrated culturally appropriate and participatory processes throughout the project. We call for increased support for languages used by Indigenous communities, including contact varieties, which provide practical economic and socio-cultural benefits, provided that participatory and culturally safe practices are enacted.


Unveiling AI's Threats to Child Protection: Regulatory efforts to Criminalize AI-Generated CSAM and Emerging Children's Rights Violations

arXiv.org Artificial Intelligence

This paper aims to present new alarming trends in the field of child sexual abuse through imagery, as part of SafeLine's research activities in the field of cybercrime, child sexual abuse material and the protection of children's rights to safe online experiences. It focuses primarily on the phenomenon of AI-generated CSAM, sophisticated ways employed for its production which are discussed in dark web forums and the crucial role that the open-source AI models play in the evolution of this overwhelming phenomenon. The paper's main contribution is a correlation analysis between the hotline's reports and domain names identified in dark web forums, where users' discussions focus on exchanging information specifically related to the generation of AI-CSAM. The objective was to reveal the close connection of clear net and dark web content, which was accomplished through the use of the ATLAS dataset of the Voyager system. Furthermore, through the analysis of a set of posts' content drilled from the above dataset, valuable conclusions on forum members' techniques employed for the production of AI-generated CSAM are also drawn, while users' views on this type of content and routes followed in order to overcome technological barriers set with the aim of preventing malicious purposes are also presented. As the ultimate contribution of this research, an overview of the current legislative developments in all country members of the INHOPE organization and the issues arising in the process of regulating the AI- CSAM is presented, shedding light in the legal challenges regarding the regulation and limitation of the phenomenon.


Methods for Legal Citation Prediction in the Age of LLMs: An Australian Law Case Study

arXiv.org Artificial Intelligence

In recent years, Large Language Models (LLMs) have shown great potential across a wide range of legal tasks. Despite these advances, mitigating hallucination remains a significant challenge, with state-of-the-art LLMs still frequently generating incorrect legal references. In this paper, we focus on the problem of legal citation prediction within the Australian law context, where correctly identifying and citing relevant legislations or precedents is critical. We compare several approaches: prompting general purpose and law-specialised LLMs, retrieval-only pipelines with both generic and domain-specific embeddings, task-specific instruction-tuning of LLMs, and hybrid strategies that combine LLMs with retrieval augmentation, query expansion, or voting ensembles. Our findings indicate that domain-specific pre-training alone is insufficient for achieving satisfactory citation accuracy even after law-specialised pre-training. In contrast, instruction tuning on our task-specific dataset dramatically boosts performance reaching the best results across all settings. We also highlight that database granularity along with the type of embeddings play a critical role in the performance of retrieval systems. Among retrieval-based approaches, hybrid methods consistently outperform retrieval-only setups, and among these, ensemble voting delivers the best result by combining the predictive quality of instruction-tuned LLMs with the retrieval system.


Spatioformer: A Geo-encoded Transformer for Large-Scale Plant Species Richness Prediction

arXiv.org Artificial Intelligence

Earth observation data have shown promise in predicting species richness of vascular plants ($\alpha$-diversity), but extending this approach to large spatial scales is challenging because geographically distant regions may exhibit different compositions of plant species ($\beta$-diversity), resulting in a location-dependent relationship between richness and spectral measurements. In order to handle such geolocation dependency, we propose Spatioformer, where a novel geolocation encoder is coupled with the transformer model to encode geolocation context into remote sensing imagery. The Spatioformer model compares favourably to state-of-the-art models in richness predictions on a large-scale ground-truth richness dataset (HAVPlot) that consists of 68,170 in-situ richness samples covering diverse landscapes across Australia. The results demonstrate that geolocational information is advantageous in predicting species richness from satellite observations over large spatial scales. With Spatioformer, plant species richness maps over Australia are compiled from Landsat archive for the years from 2015 to 2023. The richness maps produced in this study reveal the spatiotemporal dynamics of plant species richness in Australia, providing supporting evidence to inform effective planning and policy development for plant diversity conservation. Regions of high richness prediction uncertainties are identified, highlighting the need for future in-situ surveys to be conducted in these areas to enhance the prediction accuracy.


Bushfire Severity Modelling and Future Trend Prediction Across Australia: Integrating Remote Sensing and Machine Learning

arXiv.org Artificial Intelligence

Bushfire is one of the major natural disasters that cause huge losses to livelihoods and the environment. Understanding and analyzing the severity of bushfires is crucial for effective management and mitigation strategies, helping to prevent the extensive damage and loss caused by these natural disasters. This study presents an in-depth analysis of bushfire severity in Australia over the last twelve years, combining remote sensing data and machine learning techniques to predict future fire trends. By utilizing Landsat imagery and integrating spectral indices like NDVI, NBR, and Burn Index, along with topographical and climatic factors, we developed a robust predictive model using XGBoost. The model achieved high accuracy, 86.13%, demonstrating its effectiveness in predicting fire severity across diverse Australian ecosystems. By analyzing historical trends and integrating factors such as population density and vegetation cover, we identify areas at high risk of future severe bushfires. Additionally, this research identifies key regions at risk, providing data-driven recommendations for targeted firefighting efforts. The findings contribute valuable insights into fire management strategies, enhancing resilience to future fire events in Australia. Also, we propose future work on developing a UAV-based swarm coordination model to enhance fire prediction in real-time and firefighting capabilities in the most vulnerable regions.


Portal needed for victims to report AI deepfakes, federal police union says

The Guardian

A one-stop portal for victims to report AI deepfakes to police should be established, the federal police union has said, lamenting that police were forced to "cobble together" laws to charge the first person to face prosecution for spreading deepfake images of womenlast year. The attorney general, Mark Dreyfus, introduced legislation in parliament in June that will create a new criminal offence of sharing, without consent, sexually explicit images that have been digitally created using artificial intelligence or other forms of technology. The Australian Federation Police Association (Afpa) supports the bill, arguing in a submission to a parliamentary inquiry that the current law is too difficult for officers to use. They pointed to the case of a man who was arrested and charged in October last year for allegedly sending deepfake imagery to Brisbane schools and sporting associations. The eSafety commissioner separately launched proceedings against the man over his failure to remove "intimate images" of several prominent Australians last year from a deepfake pornography website.


SecGenAI: Enhancing Security of Cloud-based Generative AI Applications within Australian Critical Technologies of National Interest

arXiv.org Artificial Intelligence

The rapid advancement of Generative AI (GenAI) technologies offers transformative opportunities within Australia's critical technologies of national interest while introducing unique security challenges. This paper presents SecGenAI, a comprehensive security framework for cloud-based GenAI applications, with a focus on Retrieval-Augmented Generation (RAG) systems. SecGenAI addresses functional, infrastructure, and governance requirements, integrating end-to-end security analysis to generate specifications emphasizing data privacy, secure deployment, and shared responsibility models. Aligned with Australian Privacy Principles, AI Ethics Principles, and guidelines from the Australian Cyber Security Centre and Digital Transformation Agency, SecGenAI mitigates threats such as data leakage, adversarial attacks, and model inversion. The framework's novel approach combines advanced machine learning techniques with robust security measures, ensuring compliance with Australian regulations while enhancing the reliability and trustworthiness of GenAI systems. This research contributes to the field of intelligent systems by providing actionable strategies for secure GenAI implementation in industry, fostering innovation in AI applications, and safeguarding national interests.


On Ambiguity and the Expressive Function of Law: The Role of Pragmatics in Smart Legal Ecosystems

arXiv.org Artificial Intelligence

This is a long paper, an essay, on ambiguity, pragmatics, legal ecosystems, and the expressive function of law. It is divided into two parts and fifteen sections. The first part (Pragmatics) addresses ambiguity from the perspective of linguistic and cognitive pragmatics in the legal field. The second part (Computing) deals with this issue from the point of view of human-centered design and artificial intelligence, specifically focusing on the notion and modelling of rules and what it means to comply with the rules. This is necessary for the scaffolding of smart legal ecosystems (SLE). I will develop this subject with the example of the architecture, information flows, and smart ecosystem of OPTIMAI, an EU project of Industry 4.0 for zero-defect manufacturing (Optimizing Manufacturing Processes through Artificial Intelligence and Virtualization).


Making deepfake images is increasingly easy – controlling their use is proving all but impossible

The Guardian

"Very creepy," was April's first thought when she saw her face on a generative AI website. April is one half of the Maddison twins. She and her sister Amelia make content for OnlyFans, Instagram and other platforms, but they also existed as a custom generative AI model – made without their consent. "It was really weird to see our faces, but not really our faces," she says. Deepfakes – the creation of realistic but false imagery, video and audio using artificial intelligence – is on the political agenda after the federal government announced last week it would introduce legislation to ban the creation and sharing of deepfake pornography as part of measures to combat violence against women.


Automating Thematic Analysis: How LLMs Analyse Controversial Topics

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are promising analytical tools. They can augment human epistemic, cognitive and reasoning abilities, and support'sensemaking' - making sense of a complex environment or subject - by analysing large volumes of data with a sensitivity to context and nuance absent in earlier text processing systems. This paper presents a pilot experiment that explores how LLMs can support thematic analysis of controversial topics. We compare how human researchers and two LLMs (GPT-4 and Llama 2) categorise excerpts from media coverage of the controversial Australian Robodebt scandal. Our findings highlight intriguing overlaps and variances in thematic categorisation between human and machine agents, and suggest where LLMs can be effective in supporting forms of discourse and thematic analysis. We argue LLMs should be used to augment - and not replace - human interpretation, and we add further methodological insights and reflections to existing research on the application of automation to qualitative research methods. We also introduce a novel card-based design toolkit, for both researchers and practitioners to further interrogate LLMs as analytical tools.